Difference-in-differences models

Difference in differences: basic intuition

  • Slightly tweaking the traditional counterfactual, DiD asks: “what would happen to the trend for this unit had it never received the treatment?”

    • How would your rate of growth change if you ate more vegetables?

    • What would happen to inflation if the Federal Reserve lowered interest rates?

    • How would the Arab Spring have unfolded if participants lacked access to cell phones and social media?

Difference in differences model

  • Probably the oldest non-experimental method of causal inference (likely dates back to 1855)

  • Units must be observed before and after the “treatment”, so most commonly applied to panel data.

  • If assumptions are met, can control for both observed and unobserved confounding.

John Snow and the London Cholera epidemic

  • 1854 Broad Street Cholera outbreak killed over 600 people in a poor district of London. What caused the outbreak?

  • What causes Cholera in general?

  • What interventions work?

Miasma theory

The immediate and chief cause of diseases is atmospheric impurity arising from decomposing remnants of the substances used for food and from the impurities given out from their own bodies. (Neil Arnott, 1844)

Snow, however, found the initial outbreak was clustered around a single water pump on Broad Street (73 of the 83 initial deaths occurred nearer to the Broad Street pump than to any other).

Snow’s map of Cholera outbreaks

The transmission of Cholera

Largely at Snow’s behest, the pump’s handle was removed, and the epidemic subsided, but does this tell us much? Outbreaks tend to subside!

Southwark and Vauxhall

  • Southwark and Vauxhall water company supplied 40,000+ homes from a reservoir that drew directly from the Thames

  • Supply had a well-established reputation for being…gross.

John Edwards “Sovereign of scented streams”

Lambeth waterworks

Lambeth waterworks, while it also drew from the Thames, moved their reservoir far upstream of the city in 1852.

The natural experiment

| Water supply | Cholera deaths, 1849 (rate per 100,000) | Cholera deaths, 1854 (rate per 100,000) |
|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 |
| Lambeth Company only | 847 | 193 |

The natural experiment

Note that the companies have different starting points (Lambeth was already cleaner even by 1849), but miasma theory might lead you to expect the same trend.

The natural experiment

If we can assume a parallel trend, then the relationship should look like this. The effect size, then, would be the difference between the counterfactual case and the observed case.

The effect of moving pumps

| Water supply | Cholera deaths, 1849 (rate per 100,000) | Cholera deaths, 1854 (rate per 100,000) | Difference in rates, 1854 compared to 1849 (rate per 100,000) |
|---|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 | 118 |
| Lambeth Company only | 847 | 193 | −653 |
| Difference-in-difference, Lambeth versus Southwark & Vauxhall | 502 | 1273 | −771 |
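
The arithmetic behind the table can be sketched in a few lines of R. This is a minimal illustration that uses only the four whole-number rates shown above; the variable names are introduced here for illustration. Working from the rounded rates gives changes of 117 and −654 rather than the 118 and −653 in the table (presumably the table was computed from unrounded rates), but the difference-in-difference is −771 either way.

```r
# Cholera death rates per 100,000, copied from the table above
sv_1849 <- 1349       # Southwark & Vauxhall (comparison group), before
sv_1854 <- 1466       # Southwark & Vauxhall, after
lambeth_1849 <- 847   # Lambeth (treated group), before
lambeth_1854 <- 193   # Lambeth, after

# Change over time within each company
change_sv <- sv_1854 - sv_1849
change_lambeth <- lambeth_1854 - lambeth_1849

# Counterfactual for Lambeth under parallel trends:
# its 1849 rate plus the change observed for Southwark & Vauxhall
lambeth_counterfactual <- lambeth_1849 + change_sv

# Two equivalent ways to get the effect estimate
did <- change_lambeth - change_sv
effect_vs_counterfactual <- lambeth_1854 - lambeth_counterfactual

print(c(did = did, effect_vs_counterfactual = effect_vs_counterfactual))
```

Both quantities are the same number by construction: subtracting the comparison group's change is the same as comparing the observed outcome to the parallel-trend counterfactual.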

The difference in difference estimator

  • Answers the question “what would have happened to the treated units if they had not received the treatment” (average treatment effect on the treated or ATT)

    • i.e. “if Lambeth had not moved the reservoir upstream, there would have been a parallel increase in the number of cholera deaths among their customers”

    • But for [the treatment] the trends between treated and control units should be parallel

    • Does not require an assumption that observations are balanced on expected values of the outcome. Unobserved confounding only matters to the extent it impacts the trend.

      • Similar to fixed effects: all time-invariant characteristics are controlled.
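
One compact way to write the estimator described above is sketched below. The notation is introduced here for illustration and is not from the original slides: \(\bar{Y}\) is a group mean, \(T\) and \(C\) mark the treated and control groups, and \(Y^0\) is the outcome a unit would have without treatment.

\[ \hat{\delta}_{DiD} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right) \]

Under this notation, the parallel trends requirement says that the untreated change would have been the same in both groups:

\[ E\left[Y^0_{T,\text{post}} - Y^0_{T,\text{pre}}\right] = E\left[Y^0_{C,\text{post}} - Y^0_{C,\text{pre}}\right] \]

Any time-invariant difference in levels between the groups cancels in the first subtraction, which is why only differences in trends can bias the estimate.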

Assumptions

  • Parallel trends: lines would be parallel but for the treatment

    • Most important (and often the most difficult to justify)
  • Exogeneity of treatment with respect to expected trends: treatment isn’t a response to baseline outcome or expected outcomes.

  • No spillover: untreated units aren’t impacted by treatment.

  • Stable groups: the before/after populations for each group are the same

    • For panel studies, this is guaranteed, but for repeated cross-sections this is a concern because people could leave or enter the groups at different times.

OLS as DiD

For a simple 2-group x 2-time period DiD model, we can get this entire thing from a fairly simple OLS model:

\[ \hat{Y} = B_0 + B_1 \text{Time} + B_2 \text{Treated} + B_3 (\text{Time} \times \text{Treated}) \]

  1. \(B_0\) The average for the control group at \(T=0\)

  2. \(B_1\) The change in the control group average from \(T=0\) to \(T=1\)

  3. \(B_2\) The difference between the treated and control units at \(T=0\)

  4. \(B_3\) The difference between the treated group's change and the control group's change from \(T=0\) to \(T=1\) (the difference-in-difference estimate)
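
To see where these interpretations come from, it can help to write out the four predicted cell means implied by the model, setting Time and Treated to 0 or 1:

\[ \hat{Y}_{\text{control},\,T=0} = B_0 \qquad \hat{Y}_{\text{control},\,T=1} = B_0 + B_1 \]

\[ \hat{Y}_{\text{treated},\,T=0} = B_0 + B_2 \qquad \hat{Y}_{\text{treated},\,T=1} = B_0 + B_1 + B_2 + B_3 \]

Taking the change over time within each group and then differencing the two changes leaves only the interaction coefficient:

\[ \left[(B_0 + B_1 + B_2 + B_3) - (B_0 + B_2)\right] - \left[(B_0 + B_1) - B_0\right] = (B_1 + B_3) - B_1 = B_3 \]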

Example

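library(tidyverse)
library(broom)  # tidy() comes from broom, which library(tidyverse) does not attach

# Cholera death rates per 100,000: control = Southwark & Vauxhall, treatment = Lambeth
df <- data.frame(
  period = factor(rep(c(0, 1), 2), labels = c("before", "after")),
  group = factor(rep(c(0, 1), each = 2), labels = c("control", "treatment")),
  deaths = c(1349, 1466, 847, 193)
)

# The period:group interaction is the difference-in-difference estimate
model <- lm(deaths ~ period * group, data = df)

tidy(model) |>
  select(term, estimate)

| term | estimate |
|---|---|
| (Intercept) | 1349 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |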

Interpretation

In this setup, the interaction term represents our difference-in-difference estimate

| term | estimate |
|---|---|
| (Intercept) | 1349 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |

| Water supply | Deaths 1849 | Deaths 1854 | 1854 - 1849 |
|---|---|---|---|
| S & V | 1349 | 1466 | 118 |
| Lambeth | 847 | 193 | −653 |
| DiD | 502 | 1273 | −771 |

Two-Way Fixed Effects

The 2 x 2 DiD model can be generalized to multiple groups and multiple periods by using a fixed effect for each group and each time period in place of the single treated-group and time indicators. This is the two-way fixed effects (TWFE) model:

\[ \hat{Y}_{gt} = \alpha_g + \gamma_t + \delta X_{gt} \]

\[ \alpha_g = \text{Group Fixed Effect} \]

\[ \gamma_t = \text{Time Fixed Effect} \]

\[ X_{gt} = \text{Post Treatment Indicator} \]

\[ \delta = \text{Difference-in-difference estimate} \]
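
A minimal sketch of this model in base R follows. The data are entirely hypothetical: a noise-free simulated panel with four groups, five periods, and a built-in effect of −2, so that the regression can be checked against a known answer. The fixed effects are entered as `factor()` dummies in `lm()`; dedicated panel packages exist, but this sketch deliberately avoids assuming any of them.

```r
# Hypothetical panel: every combination of 4 groups and 5 periods
panel <- expand.grid(group = 1:4, time = 1:5)

# Groups 3 and 4 are treated from period 4 onward (one common start date)
panel$treated <- as.integer(panel$group >= 3 & panel$time >= 4)

# Made-up group and time effects, plus a true treatment effect of -2
group_effect <- c(10, 12, 15, 9)
time_effect <- c(0, 1, 3, 2, 4)
true_delta <- -2
panel$y <- group_effect[panel$group] + time_effect[panel$time] +
  true_delta * panel$treated

# Two-way fixed effects: one dummy per group, one per period,
# and the post-treatment indicator whose coefficient is delta
twfe <- lm(y ~ factor(group) + factor(time) + treated, data = panel)

delta_hat <- unname(coef(twfe)["treated"])
print(delta_hat)
```

Because the simulated outcome has no noise and all treated groups start at the same time, the coefficient on `treated` recovers the built-in effect up to floating-point error. Real data will not behave this cleanly.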

Card and Krueger 1994

  • Do minimum wage increases reduce employment? Reasonable expectation from classical economics, but empirical evidence is limited.
  • Collected employment data before and after a minimum wage increase in New Jersey and compared it with data from the same time periods for Pennsylvania

Results (reproduced by Angrist and Krueger)

Including controls

  • Parallel trends may only hold conditionally on some observed characteristic

    • For instance, maybe states that increase the minimum wage are more likely to already have a low supply of labor which leads to different growth trends in the future. In that case, you might want to include a measure of the labor supply in a state.
  • Controls should be based on characteristics that do not change or were measured prior to treatment.

    • Card and Krueger include controls for region and franchise to account for compositional differences between PA vs NJ fast food workers.
  • In practice, this is as simple as adding an additional control to a regression.
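
As a sketch of what adding a control looks like in the simple 2 x 2 regression, suppose \(Z\) is a single pre-treatment covariate (a hypothetical placeholder, for example a baseline measure of labor supply) with coefficient \(B_4\):

\[ \hat{Y} = B_0 + B_1 \text{Time} + B_2 \text{Treated} + B_3 (\text{Time} \times \text{Treated}) + B_4 Z \]

\(B_3\) is still read as the difference-in-difference estimate, but the parallel trends assumption now only needs to hold among units with the same value of \(Z\).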

Considerations

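  • For binary outcomes, it generally makes sense to stick with the linear probability model, despite its flaws, because the interaction coefficient in a logit or probit model is not a difference in differences on the probability scale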

  • Simplest setup is the pre vs. post treatment cross-sectional case.

  • Biggest assumption is the parallel trend. Visual examination can help, especially if you have observations for prior outcomes

Staggered exposure

Including multiple years and cases with staggered treatments can make it easier to justify the parallel trends assumption. “Eventually treated” units are often a more sensible comparison case for units that have been recently treated.

However, there’s an issue with “early treated” and “late treated” units being weighted differently in the results.

Staggered exposure

year countyid first.treat treated
2003 8001 2007 0
2004 8001 2007 0
2005 8001 2007 0
2006 8001 2007 0
2007 8001 2007 1
2003 8019 2007 0
2004 8019 2007 0
2005 8019 2007 0
2006 8019 2007 0
2007 8019 2007 1
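
The `treated` column in a layout like this can be derived from `year` and `first.treat`. The sketch below rebuilds the ten rows shown above and assumes the rule "treated once the year reaches the first treatment year", which is inferred from the rows rather than stated in the source. The `rel_year` column is a name introduced here for illustration.

```r
# Rebuild the ten rows shown in the table above
panel <- data.frame(
  year = rep(2003:2007, times = 2),
  countyid = rep(c(8001, 8019), each = 5),
  first.treat = 2007
)

# Assumed rule: a county-year is treated once year >= first.treat
panel$treated <- as.integer(panel$year >= panel$first.treat)

# Years relative to first treatment (negative values are pre-treatment)
panel$rel_year <- panel$year - panel$first.treat

print(panel)
```

With staggered exposure, `first.treat` would differ across counties, and the same two lines would still produce the correct indicator and relative time for each unit.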

Staggered exposure